Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing
Wang, Chao, Li, Xuanying, Dai, Cheng, Feng, Jinglei, Luo, Yuxiang, Ouyang, Yuqi, Qin, Hao
Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks: early detections are converted into spatial prompts by a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact, spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on the complementary prompts, enforcing point-line consistency and efficiency. Experiments on Wireframe and YorkUrban show consistent improvements in accuracy and robustness, together with favorable real-time efficiency, demonstrating the effectiveness of our approach for structured geometry perception.
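The abstract's idea of turning early point detections into "compact and spatially aligned maps" can be illustrated with the common keypoint-heatmap encoding. This is a minimal sketch under that assumption; the function name, arguments, and Gaussian rendering are illustrative, not the paper's actual PLP-Encoder interface.

```python
import numpy as np

def encode_point_prompts(points, size=64, sigma=2.0):
    """Render detected junctions (x, y) as a spatially aligned prompt map:
    one Gaussian bump per point, overlaid by taking the strongest response."""
    ys, xs = np.mgrid[0:size, 0:size]
    prompt = np.zeros((size, size), dtype=np.float32)
    for (px, py) in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))
        prompt = np.maximum(prompt, g)  # keep the max over overlapping bumps
    return prompt

# Two hypothetical junction detections rendered into one 64x64 prompt map
prompt = encode_point_prompts([(10, 10), (40, 25)])
```

Such a map has the same spatial layout as the feature maps it conditions, which is what makes it usable as a dense prompt for a decoder.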
Short window attention enables long-term memorization
Cabannes, Loïc, Beck, Maximilian, Szilvasy, Gergely, Douze, Matthijs, Lomeli, Maria, Copet, Jade, Mazaré, Pierre-Emmanuel, Synnaeve, Gabriel, Jégou, Hervé
Recent works show that hybrid architectures combining sliding window softmax attention layers with linear recurrent neural network (RNN) layers outperform both of these architectures taken separately. However, the impact of the window length and the interplay between softmax attention and linear RNN layers remain under-studied. In this work, we introduce SWAX, a hybrid architecture consisting of sliding-window attention and xLSTM linear RNN layers. A counter-intuitive finding with SWAX is that larger sliding windows do not improve long-context performance. In fact, short window attention encourages the model to better train the long-term memory of the xLSTM by relying less on the softmax attention mechanism for long-context retrieval. Small sliding windows, however, are detrimental on short-context tasks that a moderately larger window would otherwise handle. Therefore, we train SWAX by stochastically changing the sliding window size, forcing the model to leverage both a longer context window and the xLSTM memory. SWAX trained with stochastic window sizes significantly outperforms regular window attention on both short- and long-context problems.
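The two mechanisms the abstract combines can be sketched concretely: sampling a window size per training step, and building the sliding-window causal mask that size implies. The candidate sizes and uniform sampling below are assumptions; the paper only states that the window is changed stochastically.

```python
import random

def sample_window(sizes=(128, 512, 2048), rng=random):
    """Pick a sliding-window size for this training step (uniformly,
    as an illustrative choice)."""
    return rng.choice(sizes)

def sliding_window_mask(seq_len, window):
    """Causal sliding-window mask: token i may attend to tokens
    j in [max(0, i - window + 1), i], never to the future."""
    return [[1 if 0 <= i - j < window else 0 for j in range(seq_len)]
            for i in range(seq_len)]

mask = sliding_window_mask(6, 3)
```

Resampling `window` every step forces the model to work both when recent context is wide and when it must fall back on the recurrent memory.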
Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks
Shen, Huanming, Huang, Baizhou, Wan, Xiaojun
Watermarking is a promising defense against the misuse of large language models (LLMs), yet it remains vulnerable to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work breaks the trade-off by introducing a novel mechanism, equivalent texture keys, in which multiple tokens within a watermark window can independently support detection. Building on this redundancy, we propose a novel watermark scheme with Sub-vocabulary decomposed Equivalent tExture Key (SEEK). It achieves a Pareto improvement, increasing resilience against scrubbing attacks without compromising robustness to spoofing. Experiments demonstrate SEEK's superiority over prior methods, yielding spoofing robustness gains of +88.2%/+92.3%/+82.0% and scrubbing robustness gains of +10.2%/+6.4%/+24.6% across diverse dataset settings.
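To see why window size governs the trade-off, it helps to sketch the baseline scheme SEEK improves on: the watermark key is a hash of the preceding window of tokens, and that key seeds a pseudo-random "green" sub-vocabulary that generation favors. This sketch shows the baseline mechanism only, not SEEK's equivalent-texture-key construction; the vocabulary size and green fraction are toy values.

```python
import hashlib, random

def green_list(window_tokens, vocab_size=16, green_frac=0.5):
    """Derive the green sub-vocabulary from the context window.
    A small window means few distinct keys, which is easy to
    reverse-engineer (spoofing); a large window means any edit
    inside the window changes the key (scrubbing)."""
    seed = int(hashlib.sha256(bytes(window_tokens)).hexdigest(), 16) % (2 ** 32)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(vocab_size * green_frac)])

g1 = green_list([1, 2, 3])
g2 = green_list([1, 2, 3])  # same window -> same key -> same green list
```

Detection counts how many generated tokens fall in their window's green list; SEEK's contribution is to make several tokens per window carry equivalent keys so this statistic survives edits.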
JaGuard: Jamming Correction of GNSS Deviation with Deep Temporal Graphs
Kesić, Ivana, Blatnik, Aljaž, Fortuna, Carolina, Bertalanič, Blaž
Global Navigation Satellite Systems (GNSS) face growing disruption from intentional jamming, undermining availability exactly when reliable positioning and timing are essential. We tackle this challenge by recasting jamming mitigation as a dynamic graph regression problem and propose Jamming Guardian (JaGuard), a new receiver-centric method based on deep temporal graph networks that estimates, and thereby corrects, the receiver's latitude and longitude errors. At each 1 Hz epoch, we model the satellite-receiver scene as a heterogeneous star graph with the receiver as the center node and the tracked satellites as leaves, carrying time-varying attributes such as SNR, azimuth, elevation, and latitude/longitude. A single-layer Heterogeneous Graph ConvLSTM (HeteroGCLSTM) fuses one-hop spatial context with short-term temporal dynamics to produce a 2D deviation vector for error mitigation. We evaluate our approach on datasets collected from physical hardware (two different commercial receivers) subjected to controlled conducted RF interference. Interference is introduced with three jammer types: Continuous Wave (CW), multi-tone (3 CW), and wideband FM. Each jammer type was exercised at six power levels from 45 to 70 dBm, with 50 repetitions per scenario, including pre-jam, jam, and recovery phases. Compared to strong multivariate time series baselines (TSMixer MLP, uniform CNN, and Seq2Point CNN), our model consistently yields the lowest Mean Absolute Error (MAE) in positional deviation. Under severe jamming at 45 dBm, it achieves an MAE of 3.64-7.74 cm. On mixed-mode datasets that pool all power levels, the MAE is 3.78 cm for GP01 and 4.25 cm for U-blox 10, surpassing Seq2Point, TSMixer, and uniform CNN. A data-efficiency split further shows that with only 10% of the training data, our approach remains clearly ahead, achieving an MAE of about 20 cm versus 36-42 cm for the baselines.
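The per-epoch star graph the abstract describes can be sketched as a plain data structure: one center (receiver) node, one leaf per tracked satellite, and an edge from each satellite to the receiver. The dictionary layout and feature names below are illustrative assumptions, not the paper's schema.

```python
def star_graph(receiver_feat, sat_feats):
    """Heterogeneous star graph for one 1 Hz epoch: the receiver is the
    single center node, each tracked satellite is a leaf carrying its
    time-varying attributes (e.g. SNR, azimuth, elevation)."""
    edges = [("sat", i, "rx", 0) for i in range(len(sat_feats))]
    return {"rx": [receiver_feat], "sat": sat_feats, "edges": edges}

# Hypothetical epoch with two tracked satellites
g = star_graph({"clock_bias": 0.0},
               [{"snr": 42.0, "az": 120.0, "el": 35.0},
                {"snr": 38.5, "az": 250.0, "el": 60.0}])
```

A temporal graph network then consumes a sequence of such per-epoch graphs; one-hop message passing over a star graph simply aggregates all satellite features into the receiver node.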
Global Navigation Satellite Systems (GNSS) underpin nearly every critical infrastructure, from telecommunications [1], aviation safety [2], and power-grid synchronization [3] to emerging drone ecosystems, where location privacy and integrity are paramount [4], and autonomous driving [5].
Hierarchical geometric deep learning enables scalable analysis of molecular dynamics
Pengmei, Zihan, Guo, Spencer C., Lorpaiboon, Chatipat, Dinner, Aaron R.
Molecular dynamics simulations can generate atomically detailed trajectories of complex systems, but analyzing these dynamics can be challenging when systems lack well-established quantitative descriptors (features). Graph neural networks (GNNs), in which messages are passed between nodes representing spatially neighboring atoms, promise to obviate manual feature engineering. However, their use for analyzing the dynamics of biomolecular systems with more than a few hundred residues has been limited both by the difficulty of capturing long-range interactions with message passing and by the memory and runtime requirements of large graphs. Here, we show how local information can be aggregated to reduce memory and runtime requirements without sacrificing atomic detail. We demonstrate that this approach opens the door to analyzing simulations of protein-nucleic acid complexes with thousands of residues on single GPUs within minutes. For systems with hundreds of residues, for which there are sufficient data to make quantitative comparisons, we show that the approach improves performance and interpretability.
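The core idea of aggregating local information to shrink the graph can be sketched with the simplest possible scheme: pool per-atom features into per-residue nodes, so the GNN operates on hundreds of residue nodes instead of tens of thousands of atom nodes. Mean pooling is an assumption for illustration; the paper's hierarchical aggregation may differ.

```python
import numpy as np

def pool_atoms_to_residues(atom_feats, residue_ids):
    """Aggregate per-atom feature vectors into per-residue nodes by mean
    pooling, reducing graph size from num_atoms to num_residues while
    each residue vector still summarizes its atoms."""
    rid = np.array(residue_ids)
    residues = sorted(set(residue_ids))
    return np.stack([atom_feats[rid == r].mean(axis=0) for r in residues])

# Three atoms (2-d features each), the first two belonging to residue 0
atom_feats = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]])
pooled = pool_atoms_to_residues(atom_feats, [0, 0, 1])
```

Memory and runtime of message passing scale with the number of nodes and edges, so this coarsening is what makes thousand-residue complexes fit on a single GPU.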
Run-Time Monitoring of ERTMS/ETCS Control Flow by Process Mining
Vitale, Francesco, Zoppi, Tommaso, Flammini, Francesco, Mazzocca, Nicola
Ensuring the resilience of computer-based railways is increasingly crucial to account for uncertainties and changes due to the growing complexity and criticality of these systems. Although their software relies on strict verification and validation processes following well-established best practices and certification standards, anomalies can still occur at run-time due to residual faults, system and environmental modifications that were unknown at design-time, or other emergent cyber-threat scenarios. This paper explores run-time control-flow anomaly detection using process mining to enhance the resilience of ERTMS/ETCS L2 (European Rail Traffic Management System / European Train Control System Level 2). Process mining learns the actual control flow of the system from its execution traces, thus enabling run-time monitoring through online conformance checking. In addition, anomaly localization is performed through unsupervised machine learning to link relevant deviations to critical system components. We test our approach on a reference ERTMS/ETCS L2 scenario, namely the RBC/RBC Handover, to show its capability to detect and localize anomalies with high accuracy, efficiency, and explainability.
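Online conformance checking reduces to replaying each observed event against the mined control-flow model and flagging any event the model does not allow from the current state. The sketch below simplifies a mined process model to a plain transition relation; the state and event names are illustrative, not the actual RBC/RBC Handover protocol.

```python
def conforms(trace, transitions, start, accept):
    """Replay an execution trace against a learned control-flow model.
    Any event not allowed from the current state is a control-flow
    deviation, i.e. a run-time anomaly."""
    state = start
    for event in trace:
        if (state, event) not in transitions:
            return False  # deviation detected
        state = transitions[(state, event)]
    return state in accept  # trace must also end in an accepting state

# Toy handover-like flow (hypothetical): request -> ack -> done
model = {("idle", "request"): "pending",
         ("pending", "ack"): "active",
         ("active", "done"): "idle"}
ok = conforms(["request", "ack", "done"], model, "idle", {"idle"})
bad = conforms(["request", "done"], model, "idle", {"idle"})
```

Real process-mining tools use richer models (e.g. Petri nets with token-based replay), but the monitoring loop has this same shape.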
DSD: A Distributed Speculative Decoding Solution for Edge-Cloud Agile Large Model Serving
Yu, Fengze, Li, Leshu, McDanel, Brad, Zhang, Sai Qian
Large language model (LLM) inference often suffers from high decoding latency and limited scalability across heterogeneous edge-cloud environments. Existing speculative decoding (SD) techniques accelerate token generation but remain confined to single-node execution. We propose DSD, a distributed speculative decoding framework that extends SD to multi-device deployments through coordinated draft-target execution. Given the lack of prior work on simulating this paradigm, we first introduce DSD-Sim, a discrete-event simulator that captures network, batching, and scheduling dynamics. Building on insights from DSD-Sim, we further design an Adaptive Window Control (AWC) policy that dynamically adjusts speculation window size to optimize throughput. Experiments across diverse workloads show that DSD achieves up to 1.1x speedup and 9.7% higher throughput over existing SD baselines, enabling agile and scalable LLM serving across edge and cloud.
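The Adaptive Window Control idea, which is to adjust how many draft tokens are speculated per round based on how many the target model accepts, can be sketched as a simple feedback rule. The thresholds, step sizes, and bounds below are assumptions for illustration, not DSD's actual policy.

```python
def adapt_window(window, accept_rate, lo=0.5, hi=0.8, w_min=1, w_max=16):
    """Grow the speculation window when draft tokens are mostly accepted
    (speculation is paying off), shrink it when they are mostly rejected
    (wasted draft work and verification latency)."""
    if accept_rate > hi:
        return min(w_max, window + 1)
    if accept_rate < lo:
        return max(w_min, window - 1)
    return window

w_up = adapt_window(4, 0.9)    # high acceptance -> widen the window
w_down = adapt_window(4, 0.3)  # low acceptance -> narrow the window
```

In a distributed setting the same signal can also fold in network latency between draft and target devices, which is the kind of dynamics DSD-Sim is built to capture.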
HIMOSA: Efficient Remote Sensing Image Super-Resolution with Hierarchical Mixture of Sparse Attention
Liu, Yi, Wan, Yi, Liu, Xinyi, Wu, Qiong, Xia, Panwang, Huang, Xuejun, Zhang, Yongjun
In remote sensing applications such as disaster detection and response, real-time efficiency and lightweight models are of critical importance. Existing remote sensing image super-resolution methods, however, often face a trade-off between model performance and computational efficiency. In this paper, we propose a lightweight super-resolution framework for remote sensing imagery, named HIMOSA. Specifically, HIMOSA leverages the inherent redundancy in remote sensing imagery and introduces a content-aware sparse attention mechanism, enabling the model to achieve fast inference while maintaining strong reconstruction performance. Furthermore, to effectively leverage the multi-scale repetitive patterns found in remote sensing imagery, we introduce a hierarchical window expansion and reduce computational complexity by adjusting the sparsity of the attention. Extensive experiments on multiple remote sensing datasets demonstrate that our method achieves state-of-the-art performance while maintaining computational efficiency.
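One common way to make attention "content-aware sparse" is to let each query attend only to its top-k highest-scoring keys, so compute can be traded against accuracy by varying k. The sketch below uses that top-k formulation as an assumption; it is not HIMOSA's exact mechanism or its hierarchical window schedule.

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=2):
    """Each query attends only to its `keep` best-matching keys; a
    softmax is taken over just those keys. Smaller `keep` means
    sparser attention and less compute."""
    scores = q @ k.T                              # (nq, nk) similarities
    idx = np.argsort(scores, axis=1)[:, -keep:]   # kept key indices per query
    out = np.zeros_like(q, dtype=float)
    for i in range(q.shape[0]):
        s = scores[i, idx[i]]
        w = np.exp(s - s.max())
        w /= w.sum()                              # softmax over kept keys only
        out[i] = w @ v[idx[i]]
    return out

# With orthonormal queries/keys and keep=1, each query copies its own value row
q = np.eye(3); k = np.eye(3); v = np.arange(9.0).reshape(3, 3)
out = topk_sparse_attention(q, k, v, keep=1)
```

Expanding the candidate window hierarchically while tightening `keep` is one way such a design can keep cost roughly constant as receptive field grows.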
Block Cascading: Training Free Acceleration of Block-Causal Video Models
Bandyopadhyay, Hmrishav, Pinnaparaju, Nikhil, Entezari, Rahim, Scott, Jim, Song, Yi-Zhe, Jampani, Varun
Block-causal video generation faces a stark speed-quality trade-off: small 1.3B models manage only 16 FPS while large 14B models crawl at 4.5 FPS, forcing users to choose between responsiveness and quality. Block Cascading significantly mitigates this trade-off through training-free parallelization. Our key insight: future video blocks do not need fully denoised current blocks to begin generation. By starting block generation with partially denoised context from predecessors, we transform sequential pipelines into parallel cascades where multiple blocks denoise simultaneously. With 5 GPUs exploiting temporal parallelism, we achieve ~2x acceleration across all model scales: 1.3B models accelerate from 16 to 30 FPS, 14B models from 4.5 to 12.5 FPS. Beyond inference speed, Block Cascading eliminates overhead from KV-recaching (of ~200ms) during context switches for interactive generation. Extensive evaluations validated against multiple block-causal pipelines demonstrate no significant loss in generation quality when switching from block-causal to Block Cascading pipelines for inference. Project Page: https://hmrishavbandy.github.io/block_cascading_page/
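The scheduling insight, that block b can start denoising `offset` steps after block b-1 instead of waiting for it to finish, can be sketched as a timeline of which blocks are active at each global tick. All names and the tick abstraction are illustrative; the actual system overlaps denoising steps across GPUs.

```python
def cascade_schedule(num_blocks, steps_per_block, offset):
    """Return, for each global tick t, the denoising step each block runs.
    offset == steps_per_block reproduces the sequential pipeline (one
    block at a time); offset < steps_per_block gives the parallel
    cascade, with several blocks denoising simultaneously."""
    total = offset * (num_blocks - 1) + steps_per_block
    timeline = []
    for t in range(total):
        active = {b: t - b * offset for b in range(num_blocks)
                  if 0 <= t - b * offset < steps_per_block}
        timeline.append(active)
    return timeline

seq = cascade_schedule(4, 4, 4)  # sequential: 16 ticks, 1 block in flight
par = cascade_schedule(4, 4, 1)  # cascade: 7 ticks, up to 4 blocks in flight
```

The tick counts make the speedup mechanism explicit: total ticks drop from `num_blocks * steps_per_block` to `offset * (num_blocks - 1) + steps_per_block`, at the cost of later blocks conditioning on partially denoised context.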